Decentralized Learning for Multi-player Multi-armed Bandits
We consider the problem of distributed online learning with multiple players
in multi-armed bandits (MAB) models. Each player can pick among multiple arms.
When a player picks an arm, it receives a reward. We consider both an i.i.d.
reward model and a Markovian reward model. In the i.i.d. model, each arm is
modelled as an i.i.d. process with an unknown distribution and an unknown mean. In the
Markovian model, each arm is modelled as a finite, irreducible, aperiodic and
reversible Markov chain with an unknown probability transition matrix and
stationary distribution. The arms give different rewards to different players.
If two players pick the same arm, there is a "collision", and neither of them
gets any reward. There is no dedicated control channel for coordination or
communication among the players, and any other communication between the
players is costly and adds to the regret. We propose an online index-based
distributed learning policy that trades off
\textit{exploration v. exploitation} in the right way, and achieves expected
regret that grows only slowly with time. The motivation comes from
opportunistic spectrum access by multiple secondary users in cognitive radio
networks wherein they must pick among various wireless channels that look
different to different users. This is the first distributed learning algorithm
for multi-player MABs, to the best of our knowledge.
Comment: 33 pages, 3 figures. Submitted to IEEE Transactions on Information Theory.
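The interaction model described above (per-player arm means, collisions yielding zero reward, no communication) can be sketched in a toy simulation. This is an illustrative assumption-laden sketch, not the paper's policy (which is not named in this abstract): each player here runs an ordinary UCB1 index independently, with a tiny random tie-break so symmetric players do not collide forever.

```python
import math
import random

def simulate(num_players=2, num_arms=3, horizon=2000, seed=0):
    """Toy simulation of the multi-player collision model: if two
    players pick the same arm, neither receives any reward.
    (Illustrative only; not the policy proposed in the paper.)"""
    rng = random.Random(seed)
    # Arms look different to different players: per-player means.
    means = [[rng.uniform(0.1, 0.9) for _ in range(num_arms)]
             for _ in range(num_players)]
    player_rng = [random.Random(seed + 1 + p) for p in range(num_players)]
    # Each player keeps only its own statistics (no communication).
    counts = [[0] * num_arms for _ in range(num_players)]
    sums = [[0.0] * num_arms for _ in range(num_players)]
    total_reward, collisions = 0.0, 0

    for t in range(1, horizon + 1):
        picks = []
        for p in range(num_players):
            untried = [a for a in range(num_arms) if counts[p][a] == 0]
            if untried:
                # Randomized initialization helps break symmetry.
                arm = player_rng[p].choice(untried)
            else:
                # UCB1 index plus a tiny random tie-break.
                arm = max(range(num_arms), key=lambda a:
                          sums[p][a] / counts[p][a]
                          + math.sqrt(2.0 * math.log(t) / counts[p][a])
                          + 1e-6 * player_rng[p].random())
            picks.append(arm)
        for p, arm in enumerate(picks):
            if picks.count(arm) > 1:   # collision: nobody is rewarded
                collisions += 1
                reward = 0.0
            else:
                reward = 1.0 if player_rng[p].random() < means[p][arm] else 0.0
            counts[p][arm] += 1
            sums[p][arm] += reward
            total_reward += reward
    return total_reward, collisions
```

Because the players cannot coordinate, uncoordinated index policies like this one keep colliding on overlapping favourite arms; handling that without a control channel is precisely the difficulty the abstract's distributed policy addresses.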
Mechanism Design for Demand Response Programs
Demand Response (DR) programs serve to reduce the consumption of electricity
at times when the supply is scarce and expensive. The utility informs the
aggregator of an anticipated DR event. The aggregator calls on a subset of its
pool of recruited agents to reduce their electricity use. Agents are paid for
reducing their energy consumption from contractually established baselines.
Baselines are counterfactual estimates of the energy an agent would have
consumed had it not participated in the DR program. Since baselines determine
payments, agents have an incentive
to inflate their baselines. We propose a novel self-reported baseline mechanism
(SRBM) where each agent reports its baseline and marginal utility. These
reports are strategic and need not be truthful. Based on the reported
information, the aggregator selects or calls on agents to meet the load
reduction target. Called agents are paid for observed reductions from their
self-reported baselines. Agents who are not called face penalties for
consumption shortfalls below their baselines. The mechanism is specified by the
probability with which agents are called, reward prices for called agents, and
penalty prices for agents who are not called. Under SRBM, we show that truthful
reporting of baseline consumption and marginal utility is a dominant strategy.
Thus, SRBM eliminates the incentive for agents to inflate baselines. SRBM is
assured to meet the load reduction target. SRBM is also nearly efficient since
it selects agents with the smallest marginal utilities, and each called agent
contributes maximally to the load reduction target. Finally, we show that SRBM
is nearly optimal in terms of the average cost of DR provision faced by the
aggregator.
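The selection step described above, in which the aggregator calls the agents with the smallest reported marginal utilities until the reduction target is covered, can be sketched as follows. This is a hypothetical greedy illustration consistent with the abstract, not the full SRBM specification (which also involves call probabilities, reward prices, and penalty prices); the report format `(agent_id, marginal_utility, capacity)` is an assumption.

```python
def select_agents(reports, target):
    """Greedy sketch of the aggregator's selection: call agents in
    order of smallest reported marginal utility until their reduction
    capacities cover the load-reduction target.
    `reports`: list of (agent_id, marginal_utility, capacity) tuples
    (illustrative format, not from the paper)."""
    chosen, covered = [], 0.0
    for agent_id, marginal_utility, capacity in sorted(reports, key=lambda r: r[1]):
        if covered >= target:
            break
        chosen.append(agent_id)
        covered += capacity
    if covered < target:
        raise ValueError("recruited pool cannot meet the reduction target")
    return chosen, covered
```

Under SRBM the reports feeding this selection are truthful in dominant strategies, which is why picking the smallest reported marginal utilities yields the near-efficiency claimed in the abstract.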
Approachability in Stackelberg Stochastic Games with Vector Costs
The notion of approachability was introduced by Blackwell [1] in the context
of vector-valued repeated games. Blackwell's famous approachability theorem
prescribes a strategy for approachability, i.e., for `steering' the average
cost of a given agent towards a given target set, irrespective of the
strategies of the other agents. In this paper, motivated by the multi-objective
optimization/decision making problems in dynamically changing environments, we
address the approachability problem in Stackelberg stochastic games with
vector-valued cost functions. We make two main contributions. First, we give a
simple and computationally tractable strategy for approachability for
Stackelberg stochastic games, along the lines of Blackwell's. Second, we give
a reinforcement learning algorithm for learning the approachable strategy when
the transition kernel is unknown. As a by-product, we also recover Blackwell's
necessary and sufficient condition for approachability of convex sets in this
setup, and thus a complete characterization. We also give sufficient conditions
for non-convex sets.
Comment: 18 pages. Submitted to Dynamic Games and Applications.
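The "steering" idea behind Blackwell-style approachability can be illustrated in a deliberately simplified, opponent-free special case: at each step, pick the action whose cost vector points most strongly from the running average cost toward the target. This is a hypothetical sketch of the geometric mechanism only, not the strategy for Stackelberg stochastic games developed in the paper; the singleton target set and the fixed action cost vectors are assumptions for illustration.

```python
def approach_point(target, action_vectors, steps=5000):
    """Opponent-free sketch of Blackwell-style steering: when the
    running average cost `avg` sits away from `target`, choose the
    action whose cost vector has the most negative inner product with
    (avg - target), dragging the average toward the target set."""
    dim = len(target)
    avg = [0.0] * dim
    for t in range(1, steps + 1):
        direction = [avg[i] - target[i] for i in range(dim)]
        best = min(action_vectors,
                   key=lambda v: sum(direction[i] * v[i] for i in range(dim)))
        # Incremental update of the running average cost.
        avg = [avg[i] + (best[i] - avg[i]) / t for i in range(dim)]
    return avg
```

For example, with the two pure cost vectors (1, 0) and (0, 1), whose convex hull contains (0.5, 0.5), the average cost converges to that target; against a strategic opponent, Blackwell's condition guarantees a mixed action achieving the analogous halfspace inequality at every step.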